Spectral clustering and the high-dimensional Stochastic Block Model
Abstract
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of many empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Block Model (Holland et al., 1983) is a social network model with well-defined communities; each node is a member of one community. For a network generated from the Stochastic Block Model, we bound the number of nodes "misclustered" by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Block Model, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a "population" normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
AMS 2000 subject classifications: Primary 62H30, 62H25; secondary 60B20.
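The two ingredients of the abstract, spectral clustering of a graph drawn from the Stochastic Block Model and the closeness of sample and population Laplacian eigenvectors, can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' code: it uses L = D^{-1/2} W D^{-1/2} (which has the same eigenvectors as the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}), keeps the k eigenvectors whose eigenvalues are largest in absolute value, clusters their rows with a plain k-means, and takes the population matrix to be Z B Z^T (diagonal included, a simplifying choice). The block sizes, the probability matrix B, and the helper names (`sample_sbm`, `normalized_laplacian`, `top_eigenvectors`, `kmeans`, `misclustered`) are hypothetical choices for illustration.

```python
import itertools

import numpy as np


def sample_sbm(labels, B, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops from an SBM.

    labels[i] is the block of node i and B[a, b] is the edge probability
    between blocks a and b. Also returns P = Z B Z^T (diagonal included).
    """
    P = B[np.ix_(labels, labels)]
    upper = np.triu(rng.random(P.shape) < P, k=1)  # independent edges for i < j
    W = (upper | upper.T).astype(float)
    return W, P


def normalized_laplacian(W):
    """L = D^{-1/2} W D^{-1/2}; isolated nodes get zero rows and columns."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def top_eigenvectors(L, k):
    """Eigenvectors for the k eigenvalues of largest absolute value."""
    vals, vecs = np.linalg.eigh(L)
    order = np.argsort(-np.abs(vals))[:k]
    return vecs[:, order]


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization of the centers."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])  # point farthest from all chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def misclustered(true, pred, k):
    """Number of misclustered nodes under the best cluster-to-block matching."""
    best = max(
        int(np.sum(np.array(perm)[pred] == true))
        for perm in itertools.permutations(range(k))
    )
    return len(true) - best


rng = np.random.default_rng(0)
k, n_per_block = 3, 200
labels = np.repeat(np.arange(k), n_per_block)
B = 0.05 + 0.25 * np.eye(k)  # within-block 0.30, between-block 0.05 (illustrative)
W, P = sample_sbm(labels, B, rng)

X = top_eigenvectors(normalized_laplacian(W), k)      # sample eigenvectors
X_pop = top_eigenvectors(normalized_laplacian(P), k)  # population eigenvectors
# Eigenvectors are only defined up to rotation, so compare the projections.
gap = np.linalg.norm(X @ X.T - X_pop @ X_pop.T, 2)
n_wrong = misclustered(labels, kmeans(X, k, rng), k)
print(f"misclustered nodes: {n_wrong} of {len(labels)}")
print(f"subspace distance (sin of largest principal angle): {gap:.3f}")
```

Farthest-first initialization is used here instead of random seeding because plain Lloyd iterations started from random points can stall with two centers inside one block; when the blocks are well separated, the point farthest from the current centers falls in a block that has no center yet.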
Similar resources
Spectral Clustering of graphs with the Bethe Hessian
Spectral clustering is a standard approach to label nodes on a graph by studying the (largest or lowest) eigenvalues of a symmetric real matrix such as the adjacency matrix or the Laplacian. Recently, it has been argued that using a more complicated, non-symmetric and higher-dimensional operator instead, related to the non-backtracking walk on the graph, leads to improved performance in detecting...
Spectral Clustering and Community Detection in Labeled Graphs
We study spectral clustering techniques to learn community structures in labeled random graphs where edge labels from a label set L = {1, ..., L} are drawn according to discrete probability distributions parametrized by community membership of the two end-nodes of the edge. This is a strict generalization of the standard stochastic block model for community detection.
Consistent parameter estimation in general stochastic block models with overlaps
This paper considers the parameter estimation problem in the Stochastic Block Model with Overlaps (SBMO), a fairly general random graph model that allows for overlapping community structure. We present a new algorithm, successive projection overlapping clustering (SPOC), which combines the ideas of spectral clustering and geometric approach for separable non-negative matrix factori...
Community Detection with the Non-Backtracking Operator
Community detection consists of identifying groups of similar items within a population. In the context of online social networks, it is a useful primitive for recommending either contacts or news items to users. We will consider a particular generative probabilistic model for the observations, namely the so-called stochastic block model, and prove that the non-backtracking operator provid...
Approximation solution of two-dimensional linear stochastic Volterra-Fredholm integral equation via two-dimensional Block-pulse functions
In this paper, an efficient numerical method based on two-dimensional block-pulse functions (BPFs) is proposed to approximate the solution of the two-dimensional linear stochastic Volterra-Fredholm integral equation. Finally, the accuracy of the method is demonstrated with an example.